73 research outputs found

    Scheduling multiple divisible loads on a linear processor network

    Min, Veeravalli, and Barlas have recently proposed strategies to minimize the overall execution time of one or several divisible loads on a heterogeneous linear network, using one or more installments. We show on a very simple example that their approach does not always produce a solution and that, when it does, the solution is often suboptimal. We also show how to find an optimal schedule for any instance, once the number of installments per load is given. Then, we formally state that any optimal schedule has an infinite number of installments under a linear cost model such as the one assumed in the original papers. Therefore, such a cost model cannot be used to design practical multi-installment strategies. Finally, extensive simulations confirm that the best solution is always produced by the linear programming approach, while the solutions of the original papers can be far from optimal.
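
The claim about an infinite number of installments can be illustrated on a toy instance. The sketch below is not the construction used in the paper: it assumes a two-processor chain, equal-size installments, a link that sends chunks back to back, and a first processor that computes while it sends. The names `p2_finish_time`, `best_makespan`, `w1`, `w2`, and `c` are hypothetical. Under a linear cost model (no start-up latency), the makespan keeps shrinking as the number of installments `k` grows, so no finite `k` is best.

```python
def p2_finish_time(alpha, k, c, w2):
    """Finish time of P2 when it receives a fraction alpha of the load
    in k equal installments (linear costs: c per unit sent, w2 per unit computed)."""
    chunk = alpha / k
    recv_end = 0.0   # time at which the current chunk is fully received
    comp_end = 0.0   # time at which P2 finishes the chunks computed so far
    for _ in range(k):
        recv_end += chunk * c                            # chunks are sent back to back
        comp_end = max(comp_end, recv_end) + chunk * w2  # compute only after full reception
    return comp_end


def best_makespan(k, w1=1.0, c=1.0, w2=1.0):
    """Best makespan for a unit load with k equal installments sent to P2.

    P2's finish time is linear in alpha, so the optimum balances both
    processors: (1 - alpha) * w1 == alpha * t2, where t2 is P2's time per unit load.
    """
    t2 = p2_finish_time(1.0, k, c, w2)
    alpha = w1 / (w1 + t2)
    return max((1.0 - alpha) * w1, p2_finish_time(alpha, k, c, w2))


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print(k, round(best_makespan(k), 4))
```

With unit costs the makespan decreases with every extra installment and approaches 0.5 without reaching it. Adding a fixed latency per message (an affine cost model) would penalize each installment and make a finite number optimal.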

    Comments on "Design and performance evaluation of load distribution strategies for multiple loads on heterogeneous linear daisy chain networks"

    Min, Veeravalli, and Barlas proposed strategies to minimize the overall execution time of one or several divisible loads on a heterogeneous linear network, using one or more installments. We show on a very simple example that the proposed approach does not always produce a solution and that, when it does, the solution is often suboptimal. We also show how to find an optimal schedule for any instance, once the number of installments per load is given. Finally, we formally prove that under a linear cost model, as in the original paper, an optimal schedule has an infinite number of installments. Such a cost model therefore cannot be used to design practical multi-installment strategies.

    Regulation of hedgehog Ligand Expression by the N-End Rule Ubiquitin-Protein Ligase Hyperplastic Discs and the Drosophila GSK3β Homologue, Shaggy

    Hedgehog (Hh) morphogen signalling plays an essential role in tissue development and homeostasis. While much is known about the Hh signal transduction pathway, far less is known about the molecules that regulate the expression of the hedgehog (hh) ligand itself. Here we reveal that Shaggy (Sgg), the Drosophila melanogaster orthologue of GSK3β, and the N-end Rule Ubiquitin-protein ligase Hyperplastic Discs (Hyd) act together to co-ordinate Hedgehog signalling through regulating hh ligand expression and Cubitus interruptus (Ci) expression. Increased hh and Ci expression within hyd mutant clones was effectively suppressed by sgg RNAi, placing sgg downstream of hyd. Functionally, sgg RNAi also rescued the adult hyd mutant head phenotype. Consistent with the genetic interactions, we found Hyd to physically interact with Sgg and Ci. Taken together we propose that Hyd and Sgg function to co-ordinate hh ligand and Ci expression, which in turn influences important developmental signalling pathways during imaginal disc development. These findings are important as tight temporal/spatial regulation of hh ligand expression underlies its important roles in animal development and tissue homeostasis. When deregulated, hh ligand family misexpression underlies numerous human diseases (e.g., colorectal, lung, pancreatic and haematological cancers) and developmental defects (e.g., cyclopia and polydactyly). In summary, our Drosophila-based findings highlight an apical role for Hyd and Sgg in initiating Hedgehog signalling, which could also be evolutionarily conserved in mammals.

    Ordonnancement en régime permanent sur plates-formes hétérogènes

    This thesis mainly deals with the mapping and scheduling of applications on large heterogeneous platforms. As the general scheduling problem is intractable, we consider two relaxations which apply to specific problems. Divisible load scheduling: Divisible loads are perfectly parallel applications, which can be split into chunks of arbitrary sizes to be distributed to many workers. We focus on scheduling several divisible loads with different characteristics on linear networks of processors, in order to minimize the total processing time. This distribution may be done using several installments. Given a number of installments, we present an algorithm giving an optimal distribution of loads on processors, and we compare it to a pre-existing solution. Moreover, we show that any optimal distribution uses an infinite number of installments, leading to infeasible solutions. This result also holds true for star-shaped platforms. Steady-state scheduling: In the second part, we discuss the issue of scheduling many copies of a given application, which is represented by a complex task graph. Instead of minimizing the completion time, we concentrate on the heart of the schedule and try to maximize the throughput of the whole platform, without considering the start or the end of our schedules. In this part, we first study the scheduling of complex but static applications, made of acyclic task graphs, on general heterogeneous platforms. To preserve a simple deployment of the application, produced schedules are made of a single allocation. Due to the NP-completeness of the problem, we not only provide an optimal solution, but also several heuristics returning efficient schedules. We compare our solutions to classical scheduling algorithms such as HEFT. In a second step, we focus on a collection of simpler but dynamic applications to schedule on fully heterogeneous master-worker platforms: the characteristics of their instances vary. Designing static schedules that account for this dynamicity is difficult, even for simple bag-of-tasks applications. Assuming that these variations are represented by random variables, we provide an ε-approximation in the clairvoyant context and efficient heuristics for both the semi-clairvoyant and non-clairvoyant cases. We present many simulations to assess their quality compared to the Round-Robin and On-Demand policies. In a third step, we deal with pipeline applications, of which several tasks are replicated on different processors to increase the global throughput. In this case, even if instances are distributed in a simple Round-Robin fashion and the mapping is completely specified, computing the throughput of the platform is difficult. We present a model based on Timed Petri Nets to compute it; we also prove that the throughput can be computed in polynomial time for the Strict One-Port communication model. Finally, steady-state techniques are effectively used to schedule complex task graphs on a heterogeneous multi-core processor, the IBM Cell. We present a theoretical model of this processor and an efficient algorithm to schedule many instances of complex task graphs. A complete implementation of this algorithm shows strong performance, with actual throughputs very close to those predicted by our solution.

    Scheduling complex streaming applications on the Cell processor

    In this paper, we consider the problem of scheduling streaming applications described by complex task graphs on a heterogeneous multicore processor, the STI Cell BE processor. We first present a theoretical model of the Cell processor. Then, we use this model to express the problem of maximizing the throughput of a streaming application on this processor. Although the problem is proven NP-complete, we present an optimal solution based on mixed linear programming. This allows us to compute the optimal mapping for a number of applications, ranging from a real audio encoder to complex random task graphs. These mappings are then tested on two platforms embedding Cell processors, and compared to simple heuristic solutions. We show that we are able to achieve a good speed-up, whereas the heuristic solutions generally fail to deal with the strong memory and communication constraints.

    Efficient Scheduling of Task Graph Collections on Heterogeneous Resources

    In this paper, we focus on scheduling jobs on computing Grids. In our model, a Grid job is made of a large collection of input data sets, which must all be processed by the same task graph or workflow, thus resulting in a collection of task graphs problem. We are looking for a competitive scheduling algorithm not requiring complex control. We thus only consider single-allocation strategies. In addition to a mixed linear programming approach to find an optimal allocation, we present different heuristic schemes. Then, using simulations, we compare the performance of our different heuristics to the performance of a classical scheduling policy in Grids, HEFT. The results show that some of our static-scheduling policies take advantage of their platform and application knowledge and outperform HEFT, especially under communication-intensive scenarios. In particular, one of our heuristics, DELEGATE, almost always achieves the best performance while having lower running times than HEFT